Migrate Nav2 timers & rates to use steady clocks and be sim-time-aware #6063
Migrate Nav2 timers & rates to use steady clocks and be sim-time-aware #6063tonynajjar wants to merge 30 commits intoros-navigation:mainfrom
Conversation
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
|
To be tested! I opened the PR to see if I find obvious issues in CI first |
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
… update sleep mechanism for interruptibility Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
|
One risk I see with this PR: The default clock type of By replacing However for I do think that e.g. controller loop should be using RCL_STEADY_TIME i.e. it shouldn't be affected by e.g. NTP changes. What do you think? |
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
There was a problem hiding this comment.
Especially for the controller server, costmap rates, can you comment on the previous discussions on this subject why we left them the way that they are?
I do think that e.g. controller loop should be using RCL_STEADY_TIME i.e. it shouldn't be affected by e.g. NTP changes. What do you think?
This is a key part of that, I think. As well as for many algorithms that cannot/should not be run at multiples of rate in simulation due to inability to realistically ever keep up - causing incorrect behaviors when running at > 1x speed which are not deterministic at 1x speed.
I mentioned in the ticket that #3325 (comment) is a good summary of the current state
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Regarding your previous conclusion here:
I mean it highly depends on the computational complexity of the plugins and the machine's specs no? Without hard numbers yet, but it seems like I can run MPPI on 10 Hz at 5x speed on my Intel(R) Core(TM) Ultra 9 285H.
In case you remember, would be helpful to share more details so I can try to reproduce. The setup I tested so far is quite a simple one with loopback simulation
For this, it's not a blocker, we should just find out a way to switch between "/clock time" and RCL_STEADY_TIME instead of RCL_SYSTEM_TIME (As mentioned, a RCL_ROS_TIME clock will currently choose between "/clock time" and RCL_SYSTEM_TIME according to |
|
I don't recall the specific issues I ran into from that long ago unfortunately. I do remember frustrating number of system tests became flaky though. I'll rerun CI a few times and see if that's still the case here.
On this point for TimedBehavior, ControllerServer, Costmap2DROS, VelocitySmoother, and WaypointFollower: I generally agree. However, I think that the following is too brute force in the application code I don't think we can set the clock type when not simulation time in the NodeOptions unfortunately. Maybe though something to consider adding? The next best thing would be to create a Are there any cases for Rate we want to use NTP corrections? I kind of doubt it. Maybe missed a few?
Given the BT.CPP changes required, can we change this to a draft until that is merged by Davide? |
it's merged already :D We still need to keep BT.CPP in the vendor.repos until next bloom release I presume |
|
btw looks like the Jazzy and Kilted workflows don't build the underlay from vendor.repos? Does that mean that if this get merged the workflows will remain broken? Is that new to you? |
|
Yes, and that is technically expected. I'm on the fence about it, but tentatively going to update the job to use the repos file for now. The main issue is that what happens if changes are made to a key dependency like BT.CPP which are not backward compatible and cannot/will not be updated in Humble or Jazzy. Our CI works for those distributions, but users themselves cannot do it without also updating something like BT.CPP which from v3 to v4 may fundamentally not work with their existing code. That's not an issue today since Davide appears to release v4 up to date in every ROS distribution, so I'll kick this can down the road for another time if this becomes an issue. Here's a PR to resolve: #6069 |
nav2_behavior_tree/include/nav2_behavior_tree/utils/loop_rate.hpp
Outdated
Show resolved
Hide resolved
|
ros2/rclcpp#3122 made an issue for rclcpp but will will need to find a local solution in the meantime - I'm working on that |
… nav2::Rate and nav2::create_timer for improved simulation time support Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
…reate_timer and nav2::Rate for improved simulation time handling Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
|
Actually in terms of changes that's pretty much it so you can take a look already. Just didn't have the time to make a nice description but basically, with the help of some custom Nav2 wrappers of |
There was a problem hiding this comment.
Otherwise the code that's changed LGTM.
I will say though that I need to look at this again with fresh eyes. Does this imply that we should actually update all Rate's or Timers to use this as well? Is there a downside to that in any case? I do like the idea of abstracting even more rclcpp::'s into nav2::'s but only if that's really the right answer.
nav2_behavior_tree/include/nav2_behavior_tree/utils/loop_rate.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Updates Nav2 timing primitives to be robust to system clock jumps and simulation-time execution by selecting an appropriate clock (ROS/sim clock when use_sim_time=true, otherwise steady/monotonic time) for rates and timers.
Changes:
- Added
nav2::selectClock()andnav2::WallRateand migrated several rate-limited loops to use it. - Added sim-time-aware
create_wall_timerhelpers (free function +nav2::LifecycleNodeshadow) and migratedLifecycleManagertimers. - Updated BT execution loop (
LoopRate+BehaviorTreeEngine) to use an injected clock and a wakeup-driven wait loop.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/underlay.repos | Pins BehaviorTree.CPP to a specific commit to pick up Tree::wakeUpSignal() support. |
| nav2_waypoint_follower/src/waypoint_follower.cpp | Migrates waypoint follower loop rate to nav2::WallRate. |
| nav2_ros_common/include/nav2_ros_common/wall_rate.hpp | Introduces nav2::selectClock() and nav2::WallRate for sim-time-aware rate limiting. |
| nav2_ros_common/include/nav2_ros_common/lifecycle_node.hpp | Shadows create_wall_timer() to use the selected clock for lifecycle nodes. |
| nav2_ros_common/include/nav2_ros_common/interface_factories.hpp | Adds nav2::create_wall_timer() helper using clock selection. |
| nav2_lifecycle_manager/src/lifecycle_manager.cpp | Migrates timers (init_timer_, bond timers) to nav2::create_wall_timer(). |
| nav2_costmap_2d/src/costmap_2d_ros.cpp | Migrates costmap update loop to nav2::WallRate. |
| nav2_controller/src/controller_server.cpp | Migrates controller compute loop to nav2::WallRate. |
| nav2_behaviors/include/nav2_behaviors/timed_behavior.hpp | Migrates behavior cycle loop to nav2::WallRate. |
| nav2_behavior_tree/test/plugins/action/test_bt_action_node.cpp | Updates test to new LoopRate constructor signature (explicit clock). |
| nav2_behavior_tree/src/behavior_tree_engine.cpp | Injects selected clock into BT loop rate handling. |
| nav2_behavior_tree/include/nav2_behavior_tree/utils/loop_rate.hpp | Reworks sleeping to poll an injected clock using BT wakeup signal. |
| nav2_behavior_tree/include/nav2_behavior_tree/behavior_tree_engine.hpp | Adds rate_clock_ member to support sim/steady timing for BT loop. |
| nav2_amcl/src/amcl_node.cpp | Minor include cleanup and explicitly calls this->create_wall_timer(). |
nav2_behavior_tree/include/nav2_behavior_tree/utils/loop_rate.hpp
Outdated
Show resolved
Hide resolved
| template<typename NodeT> | ||
| rclcpp::Clock::SharedPtr selectClock(NodeT node) | ||
| { | ||
| bool use_sim_time = false; | ||
| auto params = node->get_node_parameters_interface(); | ||
| if (params->has_parameter("use_sim_time")) { | ||
| use_sim_time = params->get_parameter("use_sim_time").as_bool(); | ||
| } | ||
| if (use_sim_time) { | ||
| return node->get_clock(); | ||
| } | ||
| return std::make_shared<rclcpp::Clock>(RCL_STEADY_TIME); | ||
| } |
There was a problem hiding this comment.
New clock-selection behavior (selectClock) is core to the PR’s sim-time/steady-time guarantees, but there are no unit tests validating the selection logic (e.g., use_sim_time=true returns the node clock, false returns a steady clock) and that WallRate uses the chosen clock. Adding focused tests in nav2_ros_common/test would help prevent regressions, especially around parameter overrides and lifecycle state transitions.
| * @return The appropriate clock | ||
| */ | ||
| template<typename NodeT> | ||
| rclcpp::Clock::SharedPtr selectClock(NodeT node) |
There was a problem hiding this comment.
Topic Naming: I think we should be specific as to what clock this means / used for. Maybe selectSteadyClock or something? Not sure steady is technically correct for simulation time though
There was a problem hiding this comment.
fair, I renamed to selectSteadyOrSimClock. I know verbose but given that this function is not meant to be widely used (and temporary), I figured better clear than concise
|
Otherise just this topic:
|
I'm going to use the terminology in ros2/rclcpp#3122 (comment) For rates and timers, I think always RCL_ROS_STEADY_TIME is wanted over RCL_ROS_TIME or RCL_SYSTEM_TIME but probably there are a few places, e.g. in tests or maybe rviz panels, where we need RCL_STEADY_TIME (not sim-time aware). I would need to do a deep dive |
|
I think as part of this, we should, since its an easy grep + replace if it turns out there's no real reason to use the other now. |
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
|
Ok I did another pass and found more
I'm torn because they all have some drawbacks but on the other hand whatever we choose is a (hopefully) temporary option until ros2/rclcpp#3122 in implemented. Because of that, I tend towards option 1 |
|
Are they all tests? If so, then I think we can convert them since NTP jumps don't matter for that really. I think we may be able to also rename the |
Yes, the question is with which method from those listed above |
Hmm so maybe nav2::Rate is our RCL_ROS_STEADY_TIME and nav2::WallRate is RCL_STEADY_TIME. It's diverging quite a bit from rclcpp, I hope it's not too confusing... |
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
|
|
I mean switch everything to use the nav2 wall rate but change the name to drop "wall" if we're using it globally. Just 1 object. |
…d Rate for consistency Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Yes but we need a secondary entity to represent RCL_STEADY_TIME I think I found a nice solution, here is the current state Rates
Timers
What Is Used WhereProduction code
Tests
|
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
… tree and lifecycle node Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
|
@SteveMacenski anything remaining here? |
These changes make timer and rate-related code more robust, especially when switching between simulation and real time, and help prevent issues related to system clock jumps or simulation time discrepancies.
Basic Info
Description of contribution in a few bullet points
nav2::WallRate(nav2_ros_common/wall_rate.hpp): a drop-in replacement forrclcpp::WallRatethat uses the node's ROS clock (simulation time) whenuse_sim_time=true, andRCL_STEADY_TIME(monotonic clock) whenuse_sim_time=false— avoiding both sim-time blindness and NTP clock-jump issues.nav2::selectClock(node): a helper that returns the node's ROS clock when in simulation, or a steady clock otherwise.nav2::create_wall_timer(node, period, callback): a drop-in replacement forrclcpp::Node::create_wall_timerthat uses the same sim-time-aware clock selection.create_wall_timerinnav2::LifecycleNode: so that all Nav2 lifecycle nodes automatically get sim-time-aware timers without any call-site changes.nav2::WallRateinstead ofrclcpp::WallRate:ControllerServer::computeControl()Costmap2DROS::mapUpdateLoop()TimedBehavior(behaviors)WaypointFollower::followWaypointsHandler()LifecycleManager(init_timer_,bond_timer_,bond_respawn_timer_) to usenav2::create_wall_timer.LoopRate(BT execution loop) to accept an external clock and use a sim-time-aware polling loop viatree->wakeUpSignal()instead oftree->sleep(), which was hardwired to wall-clock time.rate_clock_toBehaviorTreeEngineinitialized vianav2::selectClock()and passed to theLoopRate.Dependencies
Requires BehaviorTree.CPP#1127 (
Tree::wakeUpSignal()getter).Description of documentation updates required from your changes
Description of how this change was tested
Future work that may be required in bullet points
For Maintainers:
backport-*.